(54) System and method for collaborative ranking of search results employing user and group profiles

(57) A system for ranking search results obtained from an information retrieval system includes a search pre-processor (30), a search engine (20) and a search post-processor (40). The search pre-processor (30) determines the context of the search query by comparing the terms in the search query with a predetermined user context profile. Preferably, the context profile is a user profile or a community profile, which includes a set of terms which have been rated by the user, community, or a recommender system. The search engine generates a search result comprising at least one item obtained from the information retrieval system. The search post-processor (40) ranks each item returned in the search result in accordance with the context of the search query.
Description

[0001] This invention relates generally to information retrieval systems and, more particularly, to a system and method of collaboratively ranking results returned from a search engine using user and group profiles.
[0002] The World Wide Web (the "web" or "WWW") is an architectural framework for accessing documents (or web pages) stored on a worldwide network of distributed servers called the Internet. Documents stored on the Internet are defined as web pages. The architectural framework of the web integrates web pages stored on the Internet using links. Web pages consist of elements that may include text, graphics, images, video and audio. A web page, which points to the location of another web page, is said to be linked to that other web page. Links that are set forth in a web page usually take the form of a text fragment or an image. A user follows a link by selecting it.

[0003] With the advent of networking technology and the World Wide Web, the ability to access information from external sources has greatly increased. Various search engines enable a user to submit a query, which returns a collection of items or documents. A well-crafted query may return a manageable set of documents, typically from 30 to 50 documents. A less narrow query may return over 1000 documents. An overly narrow query may return no documents (in which case no ranking is required). Various techniques are available for assisting the user in refining or narrowing his/her search query. However, once the search result has been properly narrowed, a significant problem in information retrieval is how to rank the results returned by the search engine or the combination of search engines.
[0004] For individual search engines, there are many different techniques for ranking results, ranging from counting the frequency of the appearance of the various search terms in the search query to calculating vector similarities between a search term vector and each returned document vector. In a networked environment such as the World Wide Web, meta-searchers access different and often heterogeneous search engines and face the additional difficulty of combining the ranking information returned by the individual engines. A meta-searcher is a Web information retrieval system aimed at finding answers to a user's query among the heterogeneous information providers distributed over the Web. When a meta-searcher receives responses (usually in the form of HTML files) from the information providers, a special component of the meta-searcher, called a wrapper, processes the responses to answer the original query. Since many search engines, including meta-searchers, hide the mechanism used for document ranking, the problem of merging search results is compounded. A problem common to both individual search engines and meta-search engines is that these approaches ignore, or know nothing about, the user conducting the search or the user's context for conducting the search.

[0005] Relevance feedback is one approach that elicits information about the user and his/her search context. Relevance feedback techniques re-rank the search results by using user feedback to recalculate the relative importance of key words in the query. While powerful from a technical point of view, relevance feedback approaches suffer from user interface issues. The relevance information required is often difficult to elicit successfully from users during the search process. U.S. Patent No. 4,996,642 to Hey, System and Method for Recommending Items, describes a system for providing recommendations to users based on other items previously sampled by the user and the availability of the items.

[0006] Knowledge Pump, a Xerox system, provides 
community-based recommendations by initially allowing 
users to identify their interests and "experts" in the 
areas of those interests. Knowledge Pump is then able 
to "push" relevant information to the users based on 
those preferences. This is accomplished by monitoring 
network traffic to create profiles of the users, including 
their interests and "communities of practice," thus refin- 
ing the community specifications. However, monitored 
or automatically created profiles for establishing context 
may not accurately reflect the user's context at all times. 
[0007] There is a need for a system and method of ranking search results which does not require user-solicited relevance information. There is also a need for a system of ranking search results which takes into account a predetermined user context profile. There is also a need for a system and method of ranking search results which ranks results based on a user-selected context. There is also a need for a system and method of ranking search results which takes into account a group or community to which the user belongs. There is a further need for a system and method of creating a user and community profile for ranking search results.
[0008] A system for ranking search results obtained 
from an information retrieval system, according to the 
invention, includes a search pre-processor, a search 
engine and a search post-processor. The search pre- 
processor, responsive to a search query, determines 
the context of the search query by comparing the terms 
in the search query with a predetermined user context 
profile. The user's context profile may include, for exam- 
ple, the user's identity, the community or set of commu- 
nities applicable to the search, and the point of view the 
user wishes to adopt (e.g., that of a domain expert) for 
the search. Preferably, the context profile is a user pro- 
file or a community profile, which includes a hierarchical 
set of terms that have been rated by the user or commu- 
nity. Also, a recommender system may be used to gen- 
erate the user or community context profile. 
EP 1 050 830 A2

[0009] The search engine, responsive to the search query, generates a search result comprising at least one item obtained from the information retrieval system. (If no items are returned, such as when the search is overly narrow, no ranking is required.) Generally, a great number of items will be generated, which the search engine will provide in its own predetermined form of hierarchical valuation. The search post-processor, responsive to a non-empty search result, ranks each item returned in the search result in accordance with the context of the search query. The ranked results may then be provided or displayed in any normal fashion, such as on a computer display or printed out. If more than one search engine is used, each search engine returns its own list of search results. The search post-processor then ranks all items returned, regardless of search engine, in accordance with the context of the search query.

[0010] A method of ranking search results obtained from an information retrieval system, according to the invention, includes providing a predetermined user context profile, generating a search query, and applying the context profile to the search query to generate a context of the search query. A search is then performed based on the search query, producing a search result that includes at least one item obtained from the information retrieval system. The search results are then ranked in accordance with the context of the search query.

[0011] The system and method according to the 
invention couples a predetermined user context profile 
(e.g., user profiling, community profiling or recom- 
mender profiling) with the search process. By coupling 
context profiling with the search process, search results 
are no longer an isolated event, but are ranked within 
the context of a particular user or community or recom- 
mender system point of view. Depending on the user's 
context for the search, a different predetermined context 
profile may be selected, thus customizing the ranking of 
each particular search. 

[0012] The user and community profiles are built by 
analyzing document collections put together by the 
users and the communities to which the users belong. If
any of the retrieved search results are considered rele- 
vant to the user or the community, they can be used to 
tune or modify the particular user or community profile 
by re-weighting the profile terms. 
[0013] User and community profiling is particularly useful in the invention. First, the post-processor uses a particular context profile (either the user's profile, the community's profile or another user's profile, such as that of a known expert in a domain outside the user's expertise) to rank the results of a search query. Preferably, a user profile is built from a user selecting a particular document collection and ranking or rating the various terms within the document collection. The user profile becomes the document collection with rating information attached to each document. A user can have more than one user context profile, or use another user's context profile in order to rank the search results most expeditiously according to a particular point of view. The ranked search results can be used to update the user profile based on new submissions or documents produced in the search and ranked using the user's context. This approach is similar to relevance feedback.
[0014] Similarly, a community (i.e., a group of users having similar interests) profile can be built by categorizing the documents in a document collection into the communities (when such a construct exists in the document collection) and then ranking the various documents according to the particular users belonging to the community. A user's ability to rank documents within the community will vary according to his/her level of expertise. Various methods to approximate a user's level of expertise within a community can be used (e.g., by agreement, by statistics, etc.).

[0015] The system of the invention provides an architecture that allows these methods to work together in support of community-based relevance feedback. The system and method of the invention provide the ability to rank results returned across multiple search engines and the ability to take into account the user's context through use of user, community or expert user profiles.

Figure 1 illustrates an example of a distributed operating environment for performing the present invention;

Figure 2 is a block diagram of a system for ranking search results obtained from an information retrieval system in accordance with a predetermined context profile.

[0016] Referring now to the figures, Figure 1 illustrates an example of a distributed operating environment for performing the present invention. In the distributed operating environment illustrated in Figure 1, client computers 102 request searches, communicate with other client computers and retrieve documents (i.e., web pages) stored on servers 104 for either viewing, storing, or printing. The client computers 102 are coupled to the servers 104 through Internet 106. Some client computers 102, which are located on an Intranet 110, communicate indirectly with servers 104 located on the Internet 106 through a proxy server 112. The client computers 102 may consist of either workstations 114 or laptops 116. Alternatively, the client computers 102 may request searches, communicate with other client computers and retrieve documents (i.e., web pages) stored on Intranet servers such as proxy server 112 for either viewing, storing, or printing.

[0017] Referring to Figure 2, a system for ranking search results obtained from an information retrieval system according to a predetermined user context profile is generally shown therein and referred to by numeral 10. System 10 includes search pre-processor 30, which takes a query 102 from a user 100 and applies a predetermined user context profile to determine the context of the search query. The user context profile may be a user profile generated by user profiler 50 or a community profile generated by a community profiler 60. Results from the search query, which generally include a plurality of hierarchically-ranked search results based on the query, are returned by the various search engines 20 or meta-search engine 80 by searching an information retrieval system (such as the Internet). These search results are then ranked by the search post-processor 40 and provided to the user in the form of ranked documents 124.
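The data flow of Figure 2 can be sketched in a few lines of Python. The sketch rests on two assumptions. First, each search engine is modeled as a callable that returns (URL, snippet) pairs. Second, the context determined by the pre-processor is captured in a scoring function `score`. The names `rank_search`, `Engine` and `Result` are hypothetical. Duplicates returned by several engines are merged by URL before the post-processor ranks everything, regardless of which engine returned it.

```python
from typing import Callable, Iterable

# Hypothetical model: a result is a (url, snippet) pair and an engine is
# any callable that maps a query string to a list of such results.
Result = tuple[str, str]
Engine = Callable[[str], list[Result]]


def rank_search(query: str,
                engines: Iterable[Engine],
                score: Callable[[Result], float]) -> list[Result]:
    """Query every engine, merge the lists by URL, and re-rank by context."""
    merged: dict[str, Result] = {}
    for engine in engines:
        for url, snippet in engine(query):
            # Keep the first copy of a URL returned by several engines.
            merged.setdefault(url, (url, snippet))
    # Each engine's own ordering is discarded; only the context score counts.
    # An empty result yields an empty list, i.e. nothing to rank.
    return sorted(merged.values(), key=score, reverse=True)
```

This sketch deliberately discards each engine's native order. That choice is what allows results from heterogeneous engines, whose ranking mechanisms may be hidden, to be merged on a common footing.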
[0018] Community profiler 60 ranks community 
document collection 70 in accordance with evaluations 
or rankings determined by the members of community 
130. In some cases, community manager 120 may 
determine from time to time whether a particular user 
may join or continue to be a part of the community. User 
profiler 50 ranks a selected document collection (which 
may also be the community document collection 70) in 
accordance with evaluations or ratings by user 100. 
[0019] The system may be extended to support 
community-based relevance feedback. In addition to the 
search pre-processor, one or more search engines or 
meta-search engines and search post-processor, the 
extended system may include one or more document 
collections with associated user, community/group, and 
rating attributes, a user profiler, a community profiler 
and a community manager. Additionally, the extended 
system may include wrappers that allow the profilers to 
extract document content (or document reference, such 
as its URL), user, community and rating information 
from the document collections and wrappers that allow 
the search pre-processor to submit queries to the 
search engine and the search post-processor to extract 
the results. 

[0020] The document collection may be one (or a 
combination) of several different types: documents 
residing in a document management system or a file 
system or documents referenced by a recommender 
system. In each case, the document collection provides 
a specific methodology for associating content with 
users and potentially with communities of users. In each 
case, the document collection provides the basis for 
establishing the user context profile, in that the docu- 
ment collection and user ratings establish the environ- 
ment or the interrelated conditions under which the user 
desires to rank search results. 

[0021] Preferred document collections include 
those provided by community recommender systems 
which attach user identification, community categoriza- 
tions and user ratings to the documents. Using docu- 
ment collections generated by community 
recommender systems allows use of the most sophisti- 
cated of the user and community profiling techniques 
described below. Preferably, a community-based relevance feedback system includes a recommender system as one of the document collections, ideally as the principal one.

[0022] An important aspect of the system for ranking is the document collection used to generate the context profile. The document collection may include an application program interface (API) for allowing the profilers to query for all documents submitted and/or reviewed by a user (who may be associated with a particular community). If such an API is not provided, then a wrapper suitable for extracting the information may be used. A wrapper is a tool used by a meta-searcher that scans the HTML files returned by the search engine, drops the markup instructions and extracts the information related to the query. Then the wrapper takes the answers from the different providers, puts them in a new format and generates an HTML file that can be viewed by the user. The API or the wrapper generates "metadata" which is used by the profilers to construct and to incrementally update the user and community profiles from the set of documents relevant to the user and in the context of the community. In the case of standard document collections (such as file systems or document management systems), it is generally assumed that any document filed or stored by the user is relevant to the user. In the case of a recommender system, it is generally assumed that any document submitted or reviewed by a user with an average or higher rating is relevant.
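The two relevance assumptions can be sketched as a filter over the metadata that the API or wrapper produces. This is a hypothetical illustration. `DocRecord`, `relevant_docs` and the `from_recommender` flag are invented names. The text also does not define "average", so the sketch assumes it means the mean of all ratings in the collection.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class DocRecord:
    # Hypothetical metadata record extracted by an API or wrapper.
    doc_id: str
    user: str
    rating: Optional[float] = None  # None for file systems / DMS entries


def relevant_docs(records: list[DocRecord], user: str,
                  from_recommender: bool) -> list[str]:
    """Return ids of the documents assumed relevant to `user`."""
    mine = [r for r in records if r.user == user]
    if not from_recommender:
        # File system or document management system: anything the
        # user filed or stored counts as relevant.
        return [r.doc_id for r in mine]
    rated = [r.rating for r in records if r.rating is not None]
    if not rated:
        return []
    average = mean(rated)  # assumption: mean over the whole collection
    # Recommender system: keep the user's documents rated at or above average.
    return [r.doc_id for r in mine
            if r.rating is not None and r.rating >= average]
```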

[0023] The search engine may include an API for submitting a search and retrieving results. If not, a suitable wrapper may be used. The problem of query translation across multiple, heterogeneous search engines and databases and the extraction of the search results is well known. Thus, any commercially available translation and extraction product may be used. However, it should be noted that search engines do not necessarily cover the documents in the system's document collections, although overlap is always possible.

[0024] The search pre-processor determines the context of the user's search by applying a predetermined context profile to the search query. For example, the context profile may include the user's identity, the community or set of communities appropriate to the search, and the point of view the user wishes to adopt for the search, if any (such as that of another user or a domain expert). The context profile can be retrieved explicitly by asking the user to identify him/herself and to select the appropriate community or communities and/or point of view. Context can also be determined (deduced) automatically by matching the query with a query memory associated with a community (if selected) or with the collection of users using the system.
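Automatic deduction of context can be sketched as a term-overlap match between the query and each community's query memory. The source does not say how the matching is done. The overlap measure, the `query_memory` layout (community name mapped to past query strings) and the name `deduce_community` are all assumptions.

```python
from typing import Optional


def deduce_community(query: str,
                     query_memory: dict[str, list[str]]) -> Optional[str]:
    """Pick the community whose past queries share the most terms with `query`."""
    terms = set(query.lower().split())
    best, best_overlap = None, 0
    for community, past_queries in query_memory.items():
        seen = {t for q in past_queries for t in q.lower().split()}
        overlap = len(terms & seen)
        if overlap > best_overlap:  # ties keep the earlier community
            best, best_overlap = community, overlap
    # None means no match: fall back to asking the user explicitly.
    return best
```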

[0025] A preferred context profile is that of the user. The user profile is created by the user profiler, which constructs a term-weight vector for each user from the set of documents the user has submitted to and/or reviewed in each of the document collections in which the user participates. Matching a user across several different document collections is not always simple. One method of accomplishing this is to ask the user to provide his/her identifier (and password, if needed) for each document collection. If a user withholds some of this information, then his/her profile will be less complete than those of users who do not withhold it. However, this is not necessarily a drawback, as a user may choose to provide access only to particular document collections that he/she deems appropriate. This problem only occurs if more than one document collection is used in the system; preferably, the system uses a single collection, the one provided by the community recommender system.

[0026] The term-weight vector or user profile $P^u$ is calculated in a standard way, although various linguistic-based enhancements are possible as noted below. For a user $u$, the vector includes the set of terms $\{t_i\}$ with their weights $w_i^u$: $P^u = \{(t_i, w_i^u)\}$. If the term-weight vector is calculated at least in part from documents that have been evaluated (implicitly or explicitly rated) in some way by the user, then the ratings given to the documents can be used to bias the term-weight vector.
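The following sketch shows one way ratings could bias the term-weight vector. It assumes plain term frequency as the "standard" weighting. Each document's term counts are multiplied by its (positive) rating, and the resulting weights are normalized to sum to 1. Both choices and the name `user_profile` are assumptions, and whitespace tokenization stands in for real term extraction.

```python
from collections import Counter


def user_profile(docs: list[tuple[str, float]]) -> dict[str, float]:
    """Build a term-weight vector P^u from (text, rating) pairs.

    Assumption: term frequency scaled by each document's rating, so that
    better-rated documents pull the profile harder; ratings are positive.
    """
    weights: Counter = Counter()
    for text, rating in docs:
        for term, count in Counter(text.lower().split()).items():
            weights[term] += count * rating
    total = sum(weights.values())
    # Normalize so that the weights w_i^u sum to 1.
    return {t: w / total for t, w in weights.items()} if total else {}
```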

[0027] The user profiler may also calculate the pro- 
file of the user in the context of a community or a spe- 
cific domain or domains. In this case, the user profiler 
would take into account only those documents submit- 
ted or reviewed by the user and classified (either by the 
user or automatically) into the domain. An added diffi- 
culty in this case is matching communities/domains 
across document collections. Again, if the only document collection is the recommender system, this difficulty disappears. The user profiler provides an API that returns a term-weight vector in response to a user identification and possibly a community/domain identifier.

[0028] Similarly, the community profile is created by a community profiler. The community profiler constructs a term-weight vector for each community from the set of documents classified into that community within each of the document collections. The term-weight vector for the community is determined in a way analogous to that employed for users. The community vector contains the set of terms $\{t_i\}$ with their weights $w_i^c$: $P^c = \{(t_i, w_i^c)\}$. The weight of each term is calculated from the weights of the individual community members (users). Since the contributions of the members frequently differ greatly from one another, the community profile can be biased to weigh more heavily the contribution of "experts" in the community (special users). Experts are those community members whose recommendations are most frequently followed by the whole community. Formally, each member $u$ in the community is assigned a weight $\alpha_u$. Experts have the highest $\alpha_u$, and for the whole community:

$$\sum_u \alpha_u = 1.$$

The individual $\alpha_u$ must be re-normalized whenever a user enters or leaves the community. Then, the weight of term $t_i$ in the community profile is evaluated as:

$$w_i^c = \sum_u \alpha_u \, w_i^u,$$

where $w_i^u$ is the weight of $t_i$ in the profile of user $u$.

Beyond the community and personal (user) profiles, the user can request the profile of the community expert(s), which contains a weight $w_i^{exp}$ for each profile term. The community profiler provides an API that returns a term-weight vector in response to a community identifier.

[0029] When registering a new user $u$ to a community, the initial user profile $P^u = (t_i, w_i^u)$ can be one of the following options: the community profile $(t_i, w_i^c)$; a list of user-defined keywords $t_i$ (with weights $w_i^u$ either equal or induced from the community profile); or empty. Any document reviewed or submitted by the user changes her/his profile as follows.

[0030] If a new document $D$ is submitted, all terms and their associated weights are extracted from the document, and the $k$ top-weighted terms are then selected to create a document profile $D = (t'_i, w'_i)$, for $i = 1, \ldots, k$, where $k$ is a system/application-dependent constant. Otherwise, for a reviewed document, its profile $D = (t'_i, w'_i)$ is retrieved from the repository where it is stored along with the document; each document in the document collection has its own profile.
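A sketch of building the document profile $D$ from the $k$ top-weighted terms. It assumes a term's weight is its relative frequency in the document and that ties are broken alphabetically so the result is deterministic. `document_profile` is a hypothetical name, and whitespace tokenization stands in for real term extraction.

```python
from collections import Counter


def document_profile(text: str, k: int) -> dict[str, float]:
    """Keep the k top-weighted terms of a document as its profile D."""
    counts = Counter(text.lower().split())
    total = sum(counts.values())
    if total == 0:
        return {}
    # Sort by descending count, then alphabetically to break ties.
    ranked = sorted(counts.items(), key=lambda tc: (-tc[1], tc[0]))
    # Assumption: weight = relative frequency within the whole document.
    return {t: c / total for t, c in ranked[:k]}
```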

[0031] The current user profile vector $P^u = (t_i, w_i)$ and the new document profile $D = (t'_i, w'_i)$ are used as follows to update the user profile. For each term in the set $\{t_i\} \cup \{t'_i\}$, we evaluate

$$w_i^{new} = \gamma \times w_i + (1 - \gamma) \times w'_i,$$

where $\gamma$ is a "profile conservativeness" constant, $0 < \gamma < 1$. The closer $\gamma$ is to 1, the more slowly the profile changes with new submissions. Practical values of $\gamma$ are in the range $[0.5, 0.95]$ and can depend on the number of user submissions (over the last $m$ days).

[0032] Only the $k$ top-weighted terms $t_i^{new}$ are chosen for the new user profile $P^u$, and their weights are normalized so that $\sum_i w_i^{new} = 1$.
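The update rule can be sketched as follows. Two points are assumptions. A term missing from one of the two vectors is treated as having weight 0 in it. Normalization makes the retained weights sum to 1, mirroring how the community profile is kept normalized. `update_profile` is a hypothetical name.

```python
def update_profile(profile: dict[str, float], doc: dict[str, float],
                   gamma: float, k: int) -> dict[str, float]:
    """Blend a document profile D into a user profile P^u.

    w_new = gamma * w + (1 - gamma) * w' over the union of both term sets;
    a term absent from one vector is assumed to weigh 0 there.
    """
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    blended = {t: gamma * profile.get(t, 0.0) + (1 - gamma) * doc.get(t, 0.0)
               for t in profile.keys() | doc.keys()}
    # Keep only the k top-weighted terms (ties broken alphabetically).
    top = sorted(blended.items(), key=lambda tw: (-tw[1], tw[0]))[:k]
    total = sum(w for _, w in top)
    # Assumption: normalize the retained weights to sum to 1.
    return {t: w / total for t, w in top} if total else {}
```

With `gamma` close to 1 a single new document barely moves the profile; with `gamma` near 0.5 the profile tracks recent submissions much more closely.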



[0033] When creating a community, the community administrator (which can be a human or a software program) can use one of the following options for the initial community profile $P^c = (t_i, w_i^c)$: process sample document(s) relevant to the community and extract terms and weights as with a user submission described above; use a list of community keywords $t_i$ given by the administrator; or leave it empty. Any document reviewed or submitted by a community member changes the member profile. Beyond the member profiles, the community profiler maintains values of contribution $\alpha_u$ for each community member $u$.

[0034] For user $u$ in the community, its contribution $\alpha_u$ can be evaluated as

where $r_u$ is the number of documents submitted by the user (over the last $m$ days) and $f_u$ is the number of community users that followed those recommendations. $v$ is a customization coefficient; it may favor a user with numerous but moderate recommendations over a user with a single but popular recommendation. The user with the highest value $\alpha_u$ is called the community expert, and his/her profile can be used as an expert profile $P^{exp}$ by other users for re-ranking the search results. Values $\alpha_u$ are kept normalized such that $\sum_u \alpha_u = 1$. Optionally, experts can be chosen or assigned by community members without statistical evaluation.
[0035] For all members of the community, their profiles $P^u = (t_i^u, w_i^u)$ and their contributions $\alpha_u$ are used for updating the community profile. For each term $t_i$ in $\bigcup_u \{t_i^u\}$, its weight is

$$w_i^c = \sum_u \alpha_u \, w_i^u.$$

The community profile can keep all terms from its members' profiles or only the $k$ top-weighted terms; in either case, their weights are kept normalized: $\sum_i w_i^c = 1$.
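A sketch of the community-profile computation, using the hypothetical names `community_profile` and `community_expert`. The contributions $\alpha_u$ are taken as inputs rather than computed here and are assumed positive. They are re-normalized over the current members, which covers users entering or leaving the community.

```python
from typing import Optional


def community_profile(members: dict[str, dict[str, float]],
                      alpha: dict[str, float],
                      k: Optional[int] = None) -> dict[str, float]:
    """w_i^c = sum_u alpha_u * w_i^u over the union of the members' terms."""
    # Re-normalize contributions over the current members: sum(alpha_u) == 1.
    total_alpha = sum(alpha[u] for u in members)
    a = {u: alpha[u] / total_alpha for u in members}
    weights: dict[str, float] = {}
    for u, profile in members.items():
        for term, w in profile.items():
            weights[term] = weights.get(term, 0.0) + a[u] * w
    items = sorted(weights.items(), key=lambda tw: (-tw[1], tw[0]))
    if k is not None:
        items = items[:k]  # optionally keep only the k top-weighted terms
    total = sum(w for _, w in items)
    # Keep the community weights normalized: sum_i w_i^c == 1.
    return {t: w / total for t, w in items} if total else {}


def community_expert(alpha: dict[str, float]) -> str:
    """The member with the highest contribution alpha_u."""
    return max(alpha, key=alpha.get)
```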
[0036] The update of the community profile $P^c$ is preferably performed when a minimally required number of user profile changes have occurred. The community profile update is time-consuming for the processor; thus it is preferable to update the profile off-line.
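Deferring the rebuild until enough member profiles have changed can be sketched as a simple counter. The class name and the `threshold` parameter are hypothetical. A real deployment would hand the rebuild to an off-line job rather than only counting it.

```python
class CommunityUpdater:
    """Trigger a community-profile rebuild after `threshold` member changes."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # hypothetical "minimally required" count
        self.pending = 0            # member-profile changes since last rebuild
        self.rebuilds = 0           # rebuilds triggered so far

    def member_profile_changed(self) -> bool:
        """Record one change; return True when an off-line rebuild is due."""
        self.pending += 1
        if self.pending >= self.threshold:
            self.pending = 0
            self.rebuilds += 1  # a real system would enqueue an off-line job
            return True
        return False
```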
[0037] In a preferred embodiment, the system according to the invention incorporates a recommender system. In addition to storing document profiles, user profiles, community profiles and expert profiles, a recommender system would also include or provide tools for profile retrieval and profile updates.
[0038] Matching community definitions across document collections and maintaining a coherent list of communities and users participating in those communities is frequently difficult. If a community recommender system exists within the system, then its list of communities is a likely candidate for adoption for the community relevance feedback system. Alternatively, an administrator of the system could be responsible for matching groups or collections in other document collections with the community list. The task of constructing such a list from scratch would fall to the administrator in the absence of such a list. It is also possible to create automatic methods of performing the matching, although this may reduce the accuracy of the community profiles. Depending on how the community list is constructed and matched across collections, the community profile may end up being determined entirely from the data in the community recommender system, if one exists, or in another document collection with a notion of community.
[0039] It should be noted that a community profile is not required in order to practice the system and method of the invention. In some instances, it may be appropriate to take into account only the user's context, in the absence of the community, although such a system will have more difficulty accepting and adding new users.
[0040] A method of ranking search results obtained from an information retrieval system using a predetermined user context profile would include the following steps. Because of the greater benefits available from a recommender system, the user is assumed to have registered with a recommender system before he/she starts formulating queries. This permits the system to upload the user profile, community profile and/or expert profiles chosen by the user for search result ranking and profile re-weighting.

[0041] Once a user has formulated a query, the search pre-processor takes the query and processes the keywords in the query into a query profile $P^q = (t_i^q, w_i^q)$. This profile is used later by the search post-processor.

[0042] When the query is submitted to a search engine and the search result returns, the user sees the documents determined from the search query, listed or ranked in accordance with the algorithm provided by the particular search engine(s), if any. Since this ranking will likely not rank the results within the context desired by the user, the user can request re-ranking. Alternatively, the re-ranking by the search post-processor can be automatic.

[0043] The search post-processor takes as input 
the list of search results returned by the search engines. 
It has two preferred ways to evaluate the relative rank of 
the documents. One is by matching the document or its 
pointer (e.g., its URL) to one already existing in one of 
the document collections. For example, if the document 
has been rated by a community recommender system 
connected to the system within the appropriate commu- 
nity context, then this rating is given a high weight in 
determining the relative rank of the result. 
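One way to combine the two signals is a weighted blend in which a known community rating dominates. Several details are hypothetical: the blend itself, the `rating_weight` default of 0.8, the assumption that ratings are already scaled to [0, 1], and the function name. The text only says that such a rating is given a high weight.

```python
from typing import Optional


def blended_rank(url: str, community: str,
                 ratings: dict[tuple[str, str], float],
                 profile_score: float,
                 rating_weight: float = 0.8) -> float:
    """Combine a community rating (if any) with the profile-based score.

    `ratings` maps (url, community) to a rating scaled to [0, 1];
    `rating_weight` is a hypothetical knob giving known ratings high weight.
    """
    rating: Optional[float] = ratings.get((url, community))
    if rating is None:
        return profile_score  # not in any collection: use the profile alone
    return rating_weight * rating + (1 - rating_weight) * profile_score
```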
[0044] The second preferred means of evaluating the relative ranking of documents is by using the profile term-weight vectors as a source of relevance feedback. Depending on the context (i.e., user, community or expert), the appropriate profile is requested.
[0045] Each document in the search result is downloaded, i.e., the full contents of each document are downloaded for term extraction. The extraction is similar to that for submitted documents when a user context profile is being created from a document collection as described above. The search post-processor returns a document profile $P^d = (t_i^d, w_i^d)$ for each document $d$ that contains the $k'$ top-weighted terms. Generally, a document in a search response is not considered as important as a submitted document in the document collection; thus the number $k'$ of terms chosen for the profile of a document returned by the search may be less than that for a submitted document. If a document in the search result is already in the document collection, its profile is not extracted from the document but is loaded from the document storage.

[0046] Once a profile $P^d = (t_i^d, w_i^d)$ is obtained for each document $d$ in the search response, the relevance (or ranking) of document $d$ with respect to the chosen user/community/expert profile is obtained by the formula:

$$\mathrm{relevance}(d) = \frac{\sum_i w_i^{d}\, w_i^{prof}\, w_i^{q}}{\sqrt{\sum_i \left(w_i^{prof}\right)^2}}$$

where $w_i^{prof}$ are term weights in the selected user, community, or expert profile and $w_i^q$ are the weights of the query profile $P^q$. Then, documents are sorted based on their calculated rank values and presented to the user.
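A sketch of the re-ranking step. It assumes relevance is computed as $\sum_i w_i^d w_i^{prof} w_i^q / \sqrt{\sum_i (w_i^{prof})^2}$. The query-weight factor $w_i^q$ in particular is an assumption. Terms absent from a vector are treated as weight 0. `relevance` and `rerank` are hypothetical names.

```python
from math import sqrt


def relevance(doc: dict[str, float], prof: dict[str, float],
              query: dict[str, float]) -> float:
    """sum_i w_i^d * w_i^prof * w_i^q / sqrt(sum_i (w_i^prof)^2)."""
    norm = sqrt(sum(w * w for w in prof.values()))
    if norm == 0.0:
        return 0.0  # empty profile: nothing to bias the ranking with
    # Missing terms weigh 0, so only terms shared by all three vectors count.
    shared = doc.keys() & prof.keys() & query.keys()
    return sum(doc[t] * prof[t] * query[t] for t in shared) / norm


def rerank(docs: dict[str, dict[str, float]], prof: dict[str, float],
           query: dict[str, float]) -> list[str]:
    """Return document ids sorted by decreasing relevance."""
    return sorted(docs, key=lambda d: relevance(docs[d], prof, query),
                  reverse=True)
```

The denominator depends only on the selected profile, so it rescales every score by the same factor. It does not change the resulting order and matters only if scores obtained with different profiles are compared.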

[0047] Since the profile-based document re-ranking takes some time (needed for downloading the documents, extracting terms and calculating ranks), the user may request re-ranking, switch to another activity (or continue searching) and return to the re-ranked results later. Alternatively, the user may request persistent queries, which are executed off-line.
[0048] Note that the term weights $w^d_i$ are evaluated from the content of document d only. Since a response can actually be a brief description of an original document, the term weights $w^d_i$ in this case may be quite different from the term weights $w^d_i$ obtained when the full document is available. Although the relevance ranking is then biased by the profile (through the profile term weights $w^{prof}_i$), there is a standard tradeoff between the length of the response documents and the quality of the ranking: the longer the document, the more precise the ranking (and the longer it takes to perform the ranking). After each document in the search result is ranked, the results are displayed in ranked order to the user.



[0049] Either kind of user search (with or without re-ranking) can lead to finding documents relevant to the community. Such documents, submitted by the user after the search, may be used to change or modify the user profile and consequently the community profile, as discussed above.

[0050] Not all search engines are necessarily external to the system. An internal document collection can also be searched with a query. The post-processing in this case is simpler. Indeed, the term frequency vector for a document can be extracted a priori, stored along with the document in the collection and reused in the relevance ranking each time the document fits a query, thus reducing the ranking time.

[0051] If a user provides positive feedback on the search result or on documents retrieved during the process, the search results can be included in the document collection like any other recommendation. Additionally, the search results can be used to modify the user profile by re-weighting term weights. In such a case, the query terms and/or the most frequent terms in the response form a set {Rel} of relevant terms. Using this set, the standard Rocchio formula for relevance feedback can be used. The main difference between the approach described herein and standard relevance feedback is that the approach of the invention does not take non-relevant terms into account, since this approach has no reliable way of extracting this kind of information from either document collections or search results. As a result of re-weighting, the relevant terms from {Rel} have their weights increased in the user profile.
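A minimal sketch of this positive-only re-weighting, assuming the classic Rocchio form with the non-relevant component dropped (γ = 0): every profile weight is scaled by α, and each term in {Rel} gains β/|Rel|, i.e. {Rel} is treated as a single relevant vector with a uniform weight per term. The defaults α = 1.0 and β = 0.75 and the name `rocchio_positive` are illustrative assumptions, not values from this specification.

```python
def rocchio_positive(profile: dict[str, float], rel_terms: set[str],
                     alpha: float = 1.0, beta: float = 0.75) -> dict[str, float]:
    """Positive-only Rocchio re-weighting of a user profile (gamma = 0)."""
    # Scale the existing profile by alpha.
    updated = {t: alpha * w for t, w in profile.items()}
    if not rel_terms:
        return updated
    # {Rel} acts as one relevant vector with weight 1/|Rel| per term (assumption),
    # so each relevant term gains beta / |Rel|; no term is ever penalized.
    boost = beta / len(rel_terms)
    for t in rel_terms:
        updated[t] = updated.get(t, 0.0) + boost
    return updated
```

With α = 1 the existing weights are preserved, so the only effect is the increase of the weights of the terms in {Rel}, as the paragraph describes.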
[0052] The search post-processor generally requires textual content in order to evaluate the comparative relevance of the returned items in the context of a given user or community profile. This means downloading either an abstract (if available) or the entire document. The entire process could become quite lengthy, especially if the number of documents returned by the query is large. A first prefiltering step may be necessary in order to prune the list to a manageable number. However, while the time cost is high, it should be remembered that the time cost to the user of evaluating the returned documents him/herself is even higher. In many cases, users may be willing to turn collaborative ranking on, return to other work at hand, and wait for an alert indicating that the collaborative ranking process has terminated. As a further incentive to use the collaborative ranking feature, the items downloaded in the process can be cached locally, so that subsequent browsing by the user will be much less time-consuming.
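The pruning and local caching described above might be organized as in the following sketch. It assumes that prefiltering simply keeps the search engine's own top `max_items` results, and it takes the download routine as a parameter (`fetch`, hypothetical) so that no particular network library is implied; `CachingFetcher` and `prefilter` are likewise hypothetical names.

```python
from typing import Callable


class CachingFetcher:
    """Download each item at most once; later requests are served locally."""

    def __init__(self, fetch: Callable[[str], str]) -> None:
        self._fetch = fetch                # hypothetical download routine: url -> text
        self._cache: dict[str, str] = {}   # local cache reused for later browsing

    def get(self, url: str) -> str:
        if url not in self._cache:
            self._cache[url] = self._fetch(url)
        return self._cache[url]


def prefilter(results: list[str], max_items: int) -> list[str]:
    """Prune the engine's ranked list to a manageable size (assumed rule: keep the head)."""
    return results[:max_items]
```

The same `CachingFetcher` instance would serve both the ranking pass and the user's later browsing, which is where the time saving mentioned in the paragraph comes from.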

[0053] Document content comes in many formats. In order to operate across as many formats as possible, the search post-processor will need to be able to connect to other modules that transform content format into the search format. Preferably, all formats will be transformed into ASCII format. Some documents may fall outside the system's ability to rank them. The system will need to distinguish these from those documents that are ranked in a way meaningful to the user.
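One way to connect format converters is a registry keyed by content type, sketched below under assumptions: only a `text/plain` converter is registered, non-ASCII bytes are dropped, and any item without a converter is reported as unrankable rather than silently scored. The registry `CONVERTERS` and the functions `to_ascii` and `split_rankable` are hypothetical.

```python
from typing import Callable, Optional

Converter = Callable[[bytes], str]


def _plain_text(data: bytes) -> str:
    # Keep ASCII characters only; other bytes are dropped, not rejected.
    return data.decode("ascii", errors="ignore")


# Hypothetical registry: content type -> converter producing ASCII text.
CONVERTERS: dict[str, Converter] = {"text/plain": _plain_text}


def to_ascii(content_type: str, data: bytes) -> Optional[str]:
    """Return ASCII text, or None if no converter handles this format."""
    converter = CONVERTERS.get(content_type)
    return converter(data) if converter is not None else None


def split_rankable(items: list[tuple[str, str, bytes]]) -> tuple[dict[str, str], list[str]]:
    """Separate items the system can rank from those it cannot."""
    rankable: dict[str, str] = {}
    unrankable: list[str] = []
    for item_id, content_type, data in items:
        text = to_ascii(content_type, data)
        if text is None:
            unrankable.append(item_id)
        else:
            rankable[item_id] = text
    return rankable, unrankable
```

Keeping the unrankable items in a separate list lets the interface present them apart from the ranked results instead of mixing them in with an arbitrary score.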
[0054] It will be appreciated that the present invention may be readily implemented in software using software development environments that provide portable source code that can be used on a variety of hardware platforms. Alternatively, the disclosed system may be implemented partially or fully in hardware using standard logic circuits. Whether software or hardware is used to implement the system varies depending on the speed and efficiency requirements of the system and also the particular function and the particular software or hardware systems and the particular microprocessor or microcomputer systems being utilized.
[0055] The invention has been described with reference to a particular embodiment. Modifications and alterations will occur to others upon reading and understanding this specification taken together with the drawings. The embodiments are but examples, and various alternatives, modifications, variations or improvements may be made by those skilled in the art from this teaching which are intended to be encompassed by the following claims.

Claims 


1. A system for ranking search results obtained from an information retrieval system, comprising:

a search pre-processor, responsive to a search query, for determining a context of the search query in accordance with a predetermined user context profile;

a search engine, responsive to the search query, for generating a search result comprising at least one item obtained from the information retrieval system; and
a search post-processor, responsive to the search result, for ranking the item in accordance with the context of the search query.

2. The system of claim 1, wherein the user context profile comprises one or more of a user profile comprising a set of terms rated by the user, a community profile comprising a set of terms rated by members of the community, and a relevance profile comprising a set of terms generated by a recommender system.

3. The system of claim 1 or claim 2, wherein the search engine comprises a plurality of individual search engines, each search engine, responsive to the search query, generating a search result comprising at least one item obtained from the information retrieval system; and

wherein the post-processor, responsive to the search results, ranks the items in accordance with the context of the search query.



4. The system of any of the preceding claims, wherein the search pre-processor determines a profile of the query $P_q = (t^q_i, w^q_i)$ in accordance with the predetermined user profile, wherein $t^q_i$ comprise the query terms having term weights $w^q_i$.

5. The system of any of the preceding claims, further 
comprising a context profiler for generating a con- 
text profile and a document collection comprising a 
set of documents, the context profiler comprising a 
user profiler for constructing a term-weight vector 
for the user, the term-weight vector being extracted 
from each document in the document collection. 

6. The system of any of claims 1 to 4, further compris- 
ing a context profiler for generating a context profile 
and a document collection comprising a set of doc- 
uments, the context profiler comprising a commu- 
nity profiler for constructing a term-weight vector for 
the community, the term-weight vector being 
extracted from each document in the document collection and the community comprising a plurality of
users u. 

7. The system of claim 6, wherein the community profile $P_c = (t_j, w^c_j)$ comprises the set of terms $\{t_j\}$ with their weights $w^u_j$ for each of the individual users u in the community.

8. The system of claim 7, wherein each member u in the community is assigned a weight $\alpha_u$ and for the whole community:

$$\sum_u \alpha_u = 1$$

and wherein the weight of term $t_j$ in the community profile is evaluated as:

$$w^c_j = \sum_u \alpha_u \, w^u_j$$



9. The system of claim 4, wherein the search post-processor evaluates each item d in the search result and generates a document profile $P_d = (t^d_i, w^d_i)$ for each item d, where $t^d_i$ is the profile term and $w^d_i$ the weight of each term.

10. The system of claim 9, wherein the predetermined user context profile comprises a community profile $P_c = (t_i, w^c_i)$, where

$$w^c_i = \sum_u \alpha_u \, w^u_i$$

and $w^u_i$ is the weight of $t_i$ in a profile of user u in the community.

11. The system of claim 10, wherein the search post-processor determines the relevance of each item d in the search result in accordance with:

$$\text{relevance}(d) = \frac{\sum_t w^d_t \times w^c_t \times w^q_t}{w_d}$$

where $w^q_t$ is the weight of the term t in a query q, and $w_d$ is the vector length "projected" on the context profile and evaluated as

$$w_d = \sqrt{\sum_{t \in P_c} \left(w^d_t\right)^2}$$



12. A method of ranking search results obtained from an information retrieval system, comprising:

providing a predetermined user context profile;
generating a search query;
applying the context profile to the search query to generate a user context of the search query;
generating a search result in response to the search query, comprising at least one item obtained from the information retrieval system; and
ranking the item in accordance with the context of the search query.

13. The method of claim 12, wherein the user context profile comprises one or more of a user profile comprising a set of terms rated by the user from a user-provided document collection, a community profile comprising a set of terms rated by members of the community, and a relevance profile comprising a set of terms generated by a recommender system.

14. The method of claim 12 or claim 13, further comprising:

generating a plurality of search results, each search result being obtained from the information retrieval system;

ranking the search results in accordance with the context of the search query; and updating the predetermined user context profile using highly ranked items returned from the search query.

15. A method of creating a user context profile for use in ranking search results obtained from an information retrieval system, comprising:

providing a document collection comprising a plurality of relevant documents;
assigning a rating to each of the documents in the document collection to generate a document profile $P_d = (t^d_i, w^d_i)$ for each document in the collection;
constructing a weight-term vector, wherein the weight-term vector includes a portion of the set of terms $\{t^d_i\}$ with their weights to form a user profile $P_u = (t_i, w_i)$.

16. The method of claim 15, further comprising:

for each document d returned in response to a search query q, generating a document profile;
evaluating the search result in accordance with:

$$\text{document rank} = \text{relevance}(d) = \frac{\sum_t w^d_t \, w^{prof}_t \, w^q_t}{w_d}$$

where $w^d_t$ is the weight of term t in the response document d, $w^{prof}_t$ is the weight of term t in the user context profile, $w^q_t$ is the weight of the term t in the query q, and $w_d$ is the vector length "projected" on the context profile and evaluated as

$$w_d = \sqrt{\sum_{t \in P^{prof}} \left(w^d_t\right)^2}$$

and

updating the user context profile using the highest ranked item returned from the search query.
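Claims 7, 8 and 10 build the community profile by combining the members' profiles. Reading claim 8 as $w^c_j = \sum_u \alpha_u w^u_j$ with member weights $\alpha_u$ summing to 1, a minimal sketch is given below. Normalizing the supplied member weights inside the function is an assumption made for convenience, and the name `community_profile` is hypothetical.

```python
def community_profile(user_profiles: dict[str, dict[str, float]],
                      member_weights: dict[str, float]) -> dict[str, float]:
    """Combine member profiles into a community profile: w_j^c = sum_u alpha_u * w_j^u."""
    total = sum(member_weights[u] for u in user_profiles)
    if total <= 0:
        raise ValueError("member weights must sum to a positive value")
    community: dict[str, float] = {}
    for u, profile in user_profiles.items():
        alpha_u = member_weights[u] / total  # normalized so that sum_u alpha_u = 1
        for term, w in profile.items():
            # A term absent from a member's profile contributes 0 for that member.
            community[term] = community.get(term, 0.0) + alpha_u * w
    return community
```

Giving a member a larger weight (for example, a recognized expert) makes that member's term weights dominate the community profile used for re-ranking.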






EP 1 050 830 A2 







(19) Europäisches Patentamt
European Patent Office
Office européen des brevets

(11) EP 1 050 830 A3

(12) EUROPEAN PATENT APPLICATION

(88) Date of publication A3: 17.04.2002 Bulletin 2002/16

(51) Int Cl.7: G06F 17/30

(43) Date of publication A2: 08.11.2000 Bulletin 2000/45

(21) Application number: 00303613.4

(22) Date of filing: 28.04.2000

(84) Designated Contracting States:
AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE
Designated Extension States:
AL LT LV MK RO SI

(30) Priority: 05.05.1999 US 305435

(71) Applicant: Xerox Corporation
Rochester, New York 14644 (US)

(72) Inventors:
• Chidlovski, Boris
38240 Meylan (FR)
• Glance, Natalie S.
38240 Meylan (FR)
• Grasso, Antonietta
38000 Grenoble (FR)

(74) Representative: Skone James, Robert Edmund
GILL JENNINGS & EVERY
Broadgate House
7 Eldon Street
London EC2M 7LH (GB)




